Abstract: Due to the advent of new technologies, the amount of data produced by mankind has already reached the zettabyte scale. New devices, businesses, and social networking sites are major sources of such large volumes of data, which is collectively known as "big data". With so much data available, it becomes very difficult to perform effective analysis using traditional techniques. From the literature survey, it is found that a total of 39 tools are available for the analysis and processing of big data. The survey reveals that the most influential and established tool for analyzing big data is Apache Hadoop, an open-source framework written in Java that allows parallel processing across clusters of computers using basic programming techniques. This paper introduces Apache Hadoop and its framework and installation, and describes how it uses MapReduce and cluster programming to capture, analyze, and process big data.

Keywords: big data, Hadoop, MapReduce, HDFS, clusters.